Abstract

A standard assumption in machine learning 
is the exchangeability of data, which is equiv- 
alent to assuming that the examples are gen- 
erated from the same probability distribution 
independently. This paper is devoted to testing
the assumption of exchangeability on-line:
the examples arrive one by one, and after re- 
ceiving each example we would like to have a 
valid measure of the degree to which the as- 
sumption of exchangeability has been falsified. 
Such measures are provided by exchangeabil- 
ity martingales. We extend known techniques 
for constructing exchangeability martingales 
and show that our new method is competi- 
tive with the martingales introduced before. 
Finally we investigate the performance of our 
testing method on two benchmark datasets, 
USPS and Statlog Satellite data; for the for- 
mer, the known techniques give satisfactory 
results, but for the latter our new more flexi- 
ble method becomes necessary. 



1. Introduction 

Many machine learning algorithms have been devel- 
oped to deal with real-life high dimensional data. In 
order to state and prove properties of such algorithms 
it is standard to assume that the data satisfy the ex- 
changeability assumption (although some algorithms 
make different assumptions or, in the case of prediction 
with expert advice, do not make any statistical assump- 
tions at all). These properties can be violated if the 
assumption is not satisfied, which makes it important 
to test the data for satisfying it. 

Note that the popular assumption that the data is
i.i.d. (independent and identically distributed) has the 
same meaning for testing as the exchangeability as- 
sumption. A joint distribution of an infinite sequence 
of examples is exchangeable if it is invariant with respect to any
permutation of examples. Hence if the data is i.i.d., its 
distribution is exchangeable. On the other hand, by de 
Finetti's theorem (see, e.g., Schervish, 1995, p. 28) any 
exchangeable distribution on the data (a potentially 
infinite sequence of examples) is a mixture of distribu- 
tions under which the data is i.i.d. Therefore, testing 
for exchangeability is equivalent to testing for being 
i.i.d. 

Traditional statistical approaches to testing are inap- 
propriate for high dimensional data (see, e.g., Vapnik, 
1998, pp. 6-7). To address this challenge a previous 
study (Vovk et al., 2003) suggested a way of on-line 
testing by employing the theory of conformal prediction 
and calculating exchangeability martingales. Basically,
testing proceeds in two steps. The first step is implemented
by a conformal predictor that outputs a
sequence of p-values. The sequence is generated in the
on-line mode: examples are presented one by one and
for each new example a p-value is calculated from this
and all the previous examples. For the second step the 
authors introduced exchangeability martingales that 
are functions of the p-values and track the deviation 
from the assumption. Once the martingale grows up 
to a large value (20 and 100 are convenient rules of 
thumb) the exchangeability assumption can be rejected 
for the data. 

This paper proposes a new way of constructing mar- 
tingales in the second step of testing. To construct an 
exchangeability martingale based on the sequence of 
p-values we need a betting function, which determines 
the contribution of a p-value to the value of the martin- 
gale. In contrast to the previous studies that use a fixed 
betting function the new martingale tunes its betting 
function to the sequence to detect any deviation from 
the assumption. We show that this martingale, which
we call a plug-in martingale, is competitive with all the 
martingales covered by the previous studies; namely, 
asymptotically the former grows faster than the latter. 

1.1. Related work 

The first procedure of testing exchangeability on-line 
is described in Vovk et al. (2003). The core testing 
mechanism is an exchangeability martingale. Exchange- 
ability martingales are constructed using a sequence of 
p-values. The algorithm for generating p-values assigns
small p-values to unusual examples. This motivates
designing martingales that take a large value
if too many small p-values are generated, which leads to the
power martingales. Other martingales
(simple mixture and sleepy jumper) implement more 
complicated strategies, but follow the same idea of 
scoring on small p-values. 

Ho (2005) applies power martingales to the problem 
of change detection in time-varying data streams. The 
author shows that small p-values inflate the martingale 
values and suggests using the martingale difference as
another test for the problem. 

1.2. This paper 

To the best of our knowledge, no study has aimed 
to find any other ways of translating p-values into a 
martingale value. In this paper we propose a new 
more flexible method of constructing exchangeability 
martingales for a given sequence of p-values. 

The rest of the paper is organised as follows. Section 
2 gives the definition of exchangeability martingales. 
Section 3 presents the construction of plug-in exchange- 
ability martingales, explains the rationale behind them, 
and compares them to the power martingales, which 
have been used previously. Section 4 shows experimen- 
tal results of testing two real-life datasets for exchange- 
ability; for one of these datasets power martingales 
work satisfactorily and for the other one the greater 
flexibility of plug-in martingales becomes essential. Sec- 
tion 5 summarises the paper. 

2. Exchangeability martingales 

This section outlines necessary definitions and results 
of the previous studies. 

2.1. Exchangeability 

Consider a sequence of random variables $(Z_1, Z_2, \ldots)$
that all take values in the same example space. Then
the joint probability distribution $P(Z_1, \ldots, Z_N)$ of a
finite number of the random variables is exchangeable
if it is invariant under any permutation of the random
variables. The joint distribution of an infinite number of
random variables $(Z_1, Z_2, \ldots)$ is exchangeable if the
marginal distribution $P(Z_1, \ldots, Z_N)$ is exchangeable
for every $N$.

2.2. Martingales for testing 

As in Vovk et al. (2003), the main tool for testing ex- 
changeability on-line is a martingale. The value of the 
martingale reflects the strength of evidence against the 
exchangeability assumption. An exchangeability mar- 
tingale is a sequence of non-negative random variables 
$S_0, S_1, \ldots$ that keep the conditional expectation:

$$S_n \ge 0, \qquad S_n = E(S_{n+1} \mid S_1, \ldots, S_n),$$

where $E$ refers to the expected value with respect to
any exchangeable distribution on examples. We also
assume $S_0 = 1$. Note that we will obtain an equivalent
definition if we replace "any exchangeable distribution 
on examples" by "any distribution under which the 
examples are i.i.d." (remember the discussion of de 
Finetti's theorem in Section 1). 

To understand the idea behind martingale testing we 
can imagine a game where a player starts from the 
capital of 1, places bets on the outcomes of a sequence 
of events, and never risks bankruptcy. Then a martin- 
gale corresponds to a strategy of the player, and its 
value reflects the acquired capital. According to Ville's 
inequality (see Ville, 1939, p. 100), 

$$P\{\exists n : S_n \ge C\} \le 1/C, \qquad \forall C \ge 1,$$

it is unlikely for any $S_n$ to have a large value. For the
problem of testing exchangeability, if the final value of a 
martingale is large then the exchangeability assumption 
for the data can be rejected with the corresponding 
probability. 
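
As a worked reading of Ville's inequality, the two rule-of-thumb thresholds mentioned in Section 1 (20 and 100) translate into the following bounds, which hold under any exchangeable distribution:

```latex
\[
P\{\exists n : S_n \ge 20\} \le \tfrac{1}{20} = 0.05,
\qquad
P\{\exists n : S_n \ge 100\} \le \tfrac{1}{100} = 0.01 .
\]
```

In other words, rejecting exchangeability as soon as the martingale reaches 20 (resp. 100) produces a false alarm with probability at most 5% (resp. 1%), no matter how long the sequence is observed.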

2.3. On-line calculation of p-values 

Let $(z_1, z_2, \ldots)$ denote a sequence of examples. Each
example $z_i$ is the vector representing a set of attributes
$x_i$ and a label $y_i$: $z_i = (x_i, y_i)$. In this paper we use
conformal predictors to generate a sequence of p-values 
that corresponds to the given examples. The general 
idea of conformal prediction is to test how well a new 
example fits to the previously observed examples. For 
this purpose a "nonconformity measure" is defined. 
This is a function that estimates the strangeness of one 
example with respect to others: 

$$\alpha_i = A(z_i, \{z_1, \ldots, z_n\}),$$

Algorithm 1 Generating p-values on-line
Input: $(z_1, z_2, \ldots)$ data for testing
Output: $(p_1, p_2, \ldots)$ sequence of p-values
for $i = 1, 2, \ldots$ do
    observe a new example $z_i$
    for $j = 1$ to $i$ do
        $\alpha_j = A(z_j, \{z_1, \ldots, z_i\})$
    end for
    $p_i = \frac{\#\{j : \alpha_j > \alpha_i\} + \theta_i \#\{j : \alpha_j = \alpha_i\}}{i}$
end for



where in general {. . .} stands for a multiset (the same 
element may be repeated more than once) rather than 
a set. Typically, each example is assigned a "nonconformity
score" $\alpha_i$ based on some prediction method. In
this paper we deal with the classification problem and
the 1-Nearest Neighbor (1-NN) algorithm is used as the
underlying method to compute the nonconformity scores.
The algorithm is simple but it works well enough in 
many cases (see, e.g., Hastie et al., 2001, pp. 422-427). 
A natural way to define the nonconformity score of an 
example is by comparing its distance to the examples 
with the same label to its distance to the examples 
with a different label: 

$$\alpha_i = \frac{\min_{j \ne i:\, y_i = y_j} d(x_i, x_j)}{\min_{j \ne i:\, y_i \ne y_j} d(x_i, x_j)}, \qquad (1)$$

where $d(x_i, x_j)$ is the Euclidean distance. According
to the chosen nonconformity measure, $\alpha_i$ is high if the
example is close to another example with a different 
label and far from any examples with the same label. 
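
The nonconformity score (1) can be illustrated with a minimal Python sketch. The function names and the conventions for empty label classes and zero distances are assumptions made for this illustration; the text does not specify them.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nn_nonconformity(i, examples):
    """1-NN nonconformity score of examples[i] within the list `examples`.

    Each example is a pair (x, y). The score is the distance to the nearest
    other example with the same label divided by the distance to the nearest
    example with a different label.
    """
    x_i, y_i = examples[i]
    same = [euclidean(x_i, x) for j, (x, y) in enumerate(examples)
            if j != i and y == y_i]
    diff = [euclidean(x_i, x) for j, (x, y) in enumerate(examples)
            if j != i and y != y_i]
    # Assumed conventions: an empty class gives an infinite distance,
    # and equal numerator and denominator (including inf/inf, 0/0) give 1.
    num = min(same) if same else math.inf
    den = min(diff) if diff else math.inf
    if num == den:
        return 1.0
    if den == 0.0:
        return math.inf
    return num / den

toy = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((0.0, 3.0), "b")]
score_first = nn_nonconformity(0, toy)   # nearest same label at 1, other label at 3
```

For the first toy example the nearest example with the same label is at distance 1 and the nearest example with a different label is at distance 3, so the score is 1/3: the example conforms well to its own class.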

Using the calculated nonconformity scores of all observed
examples, the p-value $p_n$ that corresponds to
an example $z_n$ is calculated as

$$p_n = \frac{\#\{i : \alpha_i > \alpha_n\} + \theta_n \#\{i : \alpha_i = \alpha_n\}}{n},$$

where $\theta_n$ is a random number from $[0, 1]$ and the symbol
$\#$ means the cardinality of a set. Algorithm 1 summarises
the process of on-line calculation of p-values
(it is clear that it can also be applied to a finite dataset
$(z_1, \ldots, z_n)$ producing a finite sequence $(p_1, \ldots, p_n)$ of
p-values).
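
The on-line computation of p-values in Algorithm 1 might be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the experiments: the generator name, the `nonconformity(j, seen)` callback signature, the fixed random seed, and the toy score `deviation_from_mean` are all hypothetical.

```python
import random

def online_p_values(examples, nonconformity, rng=None):
    """Yield one p-value per example, following the steps of Algorithm 1.

    `nonconformity(j, seen)` must return the score of seen[j] within `seen`.
    """
    rng = rng or random.Random(0)   # fixed seed only for reproducibility
    for i in range(1, len(examples) + 1):
        seen = examples[:i]
        # Scores of all examples observed so far, including the new one.
        alphas = [nonconformity(j, seen) for j in range(i)]
        a_new = alphas[-1]
        greater = sum(1 for a in alphas if a > a_new)
        equal = sum(1 for a in alphas if a == a_new)   # counts the new example
        theta = rng.random()                           # random tie-breaking
        yield (greater + theta * equal) / i

def deviation_from_mean(j, seen):
    """Toy nonconformity score for real-valued examples (an assumption)."""
    return abs(seen[j] - sum(seen) / len(seen))

p_values = list(online_p_values([1.0, 1.1, 0.9, 1.05, 50.0], deviation_from_mean))
```

In this toy run the last example (50.0) is the strangest of the five, so no score exceeds its own and its p-value is at most 1/5.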

The following is a standard result in the theory of 
conformal prediction (see, e.g., Vovk et al. 2003, Theo- 
rem 1). 

Theorem 1. If examples $(z_1, z_2, \ldots)$ (resp. $(z_1, z_2, \ldots, z_n)$)
satisfy the exchangeability assumption, Algorithm
1 produces p-values $(p_1, p_2, \ldots)$ (resp. $(p_1, p_2, \ldots, p_n)$)
that are independent and uniformly distributed in $[0, 1]$.



The property that the examples generated by an ex- 
changeable distribution provide uniformly and indepen- 
dently distributed p-values allows us to test exchange- 
ability by calculating martingales as functions of the 
p-values. 

3. Martingales based on p-values 

This section focuses on the second part of testing: given 
the sequence of p-values a martingale is calculated as 
a function of the p-values. 

For each $i \in \{1, 2, \ldots\}$, let $f_i : [0, 1]^i \to [0, \infty)$. Let
$(p_1, p_2, \ldots)$ be the sequence of p-values generated by
Algorithm 1. We consider martingales $S_n$ of the form

$$S_n = \prod_{i=1}^{n} f_i(p_i), \quad n = 1, 2, \ldots, \qquad (2)$$

where we denote $f_i(p) = f_i(p_1, \ldots, p_{i-1}, p)$ and call the
function $f_i(p)$ a betting function.

To be sure that (2) is indeed a martingale we need the
following constraint on the betting functions $f_i$:

$$\int_0^1 f_i(p)\,dp = 1, \quad i = 1, 2, \ldots$$

Then we can check:

$$E(S_{n+1} \mid S_0, \ldots, S_n) = \int_0^1 \prod_{i=1}^{n} (f_i(p_i))\, f_{n+1}(p)\,dp = \prod_{i=1}^{n} (f_i(p_i)) \int_0^1 f_{n+1}(p)\,dp = \prod_{i=1}^{n} f_i(p_i) = S_n.$$

Using representation (2) we can update the martingale
on-line: having calculated a p-value $p_i$ for a new
example in Algorithm 1 the current martingale value
becomes $S_i = S_{i-1} \cdot f_i(p_i)$. To define the martingales
completely we need to describe the betting functions
$f_i$.

3.1. Previous results: power and simple 
mixture martingales 

Previous studies (Vovk et al., 2003) have proposed using
a fixed betting function

$$\forall i: \; f_i(p) = \varepsilon p^{\varepsilon - 1},$$

where $\varepsilon \in [0, 1]$. Several martingales were constructed
using the function. The power martingale for some $\varepsilon$,
denoted as $M_n^{\varepsilon}$, is defined as

$$M_n^{\varepsilon} = \prod_{i=1}^{n} \varepsilon p_i^{\varepsilon - 1}.$$

Figure 1. The betting functions that are used to construct
the power and simple mixture martingales.

The simple mixture martingale, denoted as $M_n$, is the
mixture of power martingales over different $\varepsilon \in [0, 1]$:

$$M_n = \int_0^1 M_n^{\varepsilon}\,d\varepsilon.$$

Such a martingale will grow only if there are many 
small p-values in the sequence. This follows from the 
shape of the betting functions: see Figure 1. If the 
generated p-values concentrate in any other part of 
the unit interval, we cannot expect the martingale to 
grow. So it might be difficult to reject the assumption 
of exchangeability for such sequences. 
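
A minimal Python sketch of the power and simple mixture martingales is given below. Working with logarithms, approximating the integral over the exponent by a midpoint rule, and the grid size are choices made for this illustration only; they are not taken from the earlier studies.

```python
import math

def log_power_martingale(p_values, eps):
    """log of prod_i eps * p_i**(eps - 1), for 0 < eps <= 1 and all p_i > 0."""
    return sum(math.log(eps) + (eps - 1.0) * math.log(p) for p in p_values)

def log_simple_mixture(p_values, grid_size=1000):
    """Approximate log of the integral over eps in [0, 1] of the power martingale.

    The midpoint rule and the grid size are assumptions of this sketch.
    """
    logs = [log_power_martingale(p_values, (k + 0.5) / grid_size)
            for k in range(grid_size)]
    top = max(logs)   # log-sum-exp trick to avoid overflow and underflow
    return top + math.log(sum(math.exp(v - top) for v in logs) / grid_size)

log_small = log_simple_mixture([0.01] * 20)   # many small p-values
log_large = log_simple_mixture([0.9] * 20)    # p-values concentrated near 1
```

With twenty p-values equal to 0.01 the mixture grows far beyond the threshold of 100, whereas with twenty p-values equal to 0.9 every power betting function returns at most 1 and the mixture decreases, which is the limitation described above.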

3.2. New plug-in approach

3.2.1. Plug-in martingale 

Let us use an estimated probability density function as
the betting function $f_i(p)$. At each step the probability
density function is estimated using the accumulated
p-values:

$$\rho_i(p) = \hat{\rho}(p_1, \ldots, p_{i-1}, p), \qquad (3)$$

where $\hat{\rho}(p_1, \ldots, p_{i-1}, p)$ is the estimate of the probability
density function using the p-values $p_1, \ldots, p_{i-1}$
output by Algorithm 1.

Substituting these betting functions into (2) we get a 
new martingale that we call a plug-in martingale. The 
martingale avoids betting if the p-values are distributed 
uniformly, but if there is any peak it will be used for 
betting. 

Estimating a probability density function. In
our experiments we have used the statistical environment
and language R. The density function in its
stats package implements kernel density estimation
with different parameters. But since p-values always lie
in the unit interval, the standard methods of kernel density
estimation lead to poor results for the points that
are near the boundary. To get better results for the
boundary points the sequence of p-values is reflected to
the left from zero and to the right from one. Then the
kernel density estimate is calculated using the extended
sample $\bigcup_{i=1}^{n} \{-p_i, p_i, 2 - p_i\}$. The estimated density
function is set to zero outside the unit interval and 
then normalised to integrate to one. For the results 
presented in this paper the parameters used are the 
Gaussian kernel and Silverman's "rule of thumb" for 
bandwidth selection. Other settings have been tried 
as well, but the results are comparable and lead to the 
same conclusions. 
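
The reflection trick can be sketched in plain Python as follows. This is not the R code used for the experiments: the function names are hypothetical, the bandwidth is the common form of Silverman's rule of thumb, 0.9 min(sd, IQR/1.34) n^(-1/5), computed on the extended sample (an assumption), the quartiles use Python's default method, and the normalisation uses a midpoint rule.

```python
import math
import statistics

def silverman_bandwidth(sample):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n**(-1/5)."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    spread = min(sd, (q3 - q1) / 1.34) if q3 > q1 else sd
    return 0.9 * spread * n ** (-0.2)

def reflected_kde(p_values):
    """Return a density on [0, 1] estimated from p-values reflected at 0 and 1."""
    extended = [q for p in p_values for q in (-p, p, 2.0 - p)]
    h = silverman_bandwidth(extended)
    norm = 1.0 / (len(extended) * h * math.sqrt(2.0 * math.pi))

    def raw(x):
        # Gaussian kernel density estimate on the extended sample.
        return norm * sum(math.exp(-0.5 * ((x - q) / h) ** 2) for q in extended)

    # Normalise numerically so that the estimate integrates to one on [0, 1].
    m = 200
    mass = sum(raw((k + 0.5) / m) for k in range(m)) / m

    def density(x):
        return raw(x) / mass if 0.0 <= x <= 1.0 else 0.0

    return density

density = reflected_kde([(k + 0.5) / 50 for k in range(50)])
```

For evenly spread p-values, as in the usage line, the estimate stays close to the uniform density on the whole unit interval, including the boundary points where an unreflected kernel estimate would drop.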

The values $S_n$ of the plug-in martingale can be updated
recursively. Suppose computing the nonconformity
scores $(\alpha_1, \ldots, \alpha_n)$ from $(z_1, \ldots, z_n)$ takes time $g(n)$
and evaluating (3) takes time $h(n)$. Then updating
$S_{n-1}$ to $S_n$ takes time $O(g(n) + n + h(n))$: indeed, it
is easy to see that calculating the rank of $\alpha_n$ in the
multiset $\{\alpha_1, \ldots, \alpha_n\}$ takes time $O(n)$.
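
The recursive update of the plug-in martingale might look as follows in Python. To keep the sketch short, a smoothed histogram stands in for the kernel density estimate; this substitution, the function names, and the number of bins are assumptions of the illustration rather than the method used in the experiments.

```python
import math

def log_plugin_martingale(p_values, estimate_density):
    """Return the path of log S_n for a plug-in martingale.

    `estimate_density(past)` returns a positive density on [0, 1] fitted to
    the p-values seen so far; it never sees the current p-value.
    """
    log_s, path, past = 0.0, [], []
    for p in p_values:
        f = estimate_density(past)   # betting function built from the past only
        log_s += math.log(f(p))      # S_i = S_{i-1} * f_i(p_i), in log-space
        path.append(log_s)
        past.append(p)
    return path

def histogram_estimator(bins=10):
    """Stand-in density estimator: histogram with add-one smoothing."""
    def estimate(past):
        counts = [1.0] * bins
        for q in past:
            counts[min(int(q * bins), bins - 1)] += 1.0
        total = sum(counts)
        # Piecewise-constant density that integrates to one on [0, 1].
        return lambda p: bins * counts[min(int(p * bins), bins - 1)] / total
    return estimate

path = log_plugin_martingale([0.95] * 200, histogram_estimator(bins=10))
```

The first bet is made with the uniform density, so the martingale starts by not moving; once p-values keep falling near 0.95 the estimated density develops a peak there and the martingale grows, even though no p-value is small.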

The performance of the plug-in martingale on real-life 
datasets will be presented in Section 4. The rest of 
the current section proves that the plug-in martingale 
provides asymptotically a better growth rate than any 
martingale with a fixed betting function. To prove 
this asymptotic property of the plug-in martingale
we need the following assumptions. 

3.2.2. Assumptions 

Consider an infinite sequence of p-values $(p_1, p_2, \ldots)$.
(This is simply a deterministic sequence.) For its finite
prefix $(p_1, \ldots, p_n)$ define the corresponding empirical
probability measure $P_n$: for a Borel set $A$ in $\mathbb{R}$,

$$P_n(A) = \frac{\#\{i = 1, \ldots, n : p_i \in A\}}{n}.$$

We say that the sequence $(p_1, p_2, \ldots)$ is stable if there
exists a probability measure $P$ on $\mathbb{R}$ such that:

1. $P_n \to P$ weakly as $n \to \infty$;

2. there exists a positive continuous density function
$\rho(p)$ for $P$: for any Borel set $A$ in $\mathbb{R}$, $P(A) =
\int_A \rho(p)\,dp$.

Intuitively, the stability means that asymptotically 
the sequence of p-values can be described well by a 
probability distribution. 

Consider a sequence $(f_1(p), f_2(p), \ldots)$ of betting functions.
(This is simply a deterministic sequence of functions
$f_i : [0, 1] \to [0, \infty)$, although we are particularly
interested in the functions $f_i(p) = \rho_i(p)$, as defined
in (3).) We say that this sequence is consistent for
$(p_1, p_2, \ldots)$ if

$$\log(f_n(p)) \to \log(\rho(p)) \quad \text{uniformly in } p \in [0, 1] \text{ as } n \to \infty.$$

Intuitively, consistency is an assumption about the 
algorithm that we use to estimate the function $\rho(p)$; in
the limit we want a good approximation. 

3.2.3. Growth rate of plug-in martingale 

The following result says that, under our assumptions, 
the logarithmic growth rate of the plug-in martingale is 
better than that of any martingale with a fixed betting 
function (remember that by a betting function we mean 
any function mapping $[0, 1]$ to $[0, \infty)$).

Theorem 2. If a sequence $(p_1, p_2, \ldots) \in [0, 1]^{\infty}$ is stable
and a sequence of betting functions $(f_1(p), f_2(p), \ldots)$
is consistent for it then, for any positive continuous
betting function $f$,

$$\liminf_{n \to \infty} \left( \frac{1}{n} \sum_{i=1}^{n} \log(f_i(p_i)) - \frac{1}{n} \sum_{i=1}^{n} \log(f(p_i)) \right) \ge 0.$$

First we explain the meaning of Theorem 2 and then
prove it. According to representation (2) after $n$ steps
the martingale grows to

$$\prod_{i=1}^{n} f_i(p_i). \qquad (4)$$

Note that if for any p-value $p \in [0, 1]$ we have $f_i(p) = 0$
then the martingale can become zero and will never
change after that. Therefore, it is reasonable to consider
positive $f_i(p)$. Then we can rewrite product (4) as a sum
of logarithms, which gives us the logarithmic growth
of the martingale:

$$\sum_{i=1}^{n} \log(f_i(p_i)).$$

We assume that the sequence of p-values is stable and
the sequence of estimated probability density functions 
that is used to construct the plug-in martingale is 
consistent. Then the limit inequality from Theorem 2 
states that the logarithmic growth rate of the plug-in 
martingale is asymptotically at least as high as that 
of any martingale with a fixed betting function (which 
have been suggested in previous studies). 
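
The comparison of growth rates can be illustrated numerically. In the sketch below the p-values are drawn from a hypothetical density of 2p on the unit interval, and the true density is used as the betting function, standing in for a consistent estimate; both choices, as well as the seed and the sample size, are assumptions of this illustration.

```python
import math
import random

def mean_log_growth(p_values, betting_function):
    """Average of log f(p_i): the per-step logarithmic growth of prod_i f(p_i)."""
    return sum(math.log(betting_function(p)) for p in p_values) / len(p_values)

rng = random.Random(0)
# Hypothetical non-uniform p-values with density 2p on (0, 1]
# (inverse-CDF sampling; 1 - U keeps every p strictly positive).
p_values = [math.sqrt(1.0 - rng.random()) for _ in range(20000)]

# Betting with the density itself versus a power betting function (eps = 0.5).
rate_density = mean_log_growth(p_values, lambda p: 2.0 * p)
rate_power = mean_log_growth(p_values, lambda p: 0.5 * p ** -0.5)
```

For this density the limiting rates can be computed exactly: betting with the density gives log 2 - 1/2, roughly 0.19 per step, whereas the power betting function with exponent parameter 0.5 gives log 0.5 + 1/4, roughly -0.44, so the corresponding power martingale shrinks while the density-based one grows.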

To prove Theorem 2 we will use the following lemma. 



Lemma 1. For any probability density functions $\rho$ and
$f$ (so that $\int_0^1 \rho(p)\,dp = 1$ and $\int_0^1 f(p)\,dp = 1$),

$$\int_0^1 \log(\rho(p))\,\rho(p)\,dp \ge \int_0^1 \log(f(p))\,\rho(p)\,dp.$$



Proof of Lemma 1. It is well known (Kullback, 1959,
p. 14) that the Kullback-Leibler divergence is always
non-negative:

$$\int_0^1 \log\left(\frac{\rho(p)}{f(p)}\right) \rho(p)\,dp \ge 0.$$

This is equivalent to the inequality asserted by
Lemma 1. □
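
The equivalence used in the proof of Lemma 1 can be spelled out by splitting the logarithm of the ratio. The step below is a sketch that assumes both integrals on the right-hand side are finite, writing $\rho$ for the first density of the lemma:

```latex
\begin{align*}
0 \le \int_0^1 \log\!\left(\frac{\rho(p)}{f(p)}\right)\rho(p)\,dp
  = \int_0^1 \log(\rho(p))\,\rho(p)\,dp
  - \int_0^1 \log(f(p))\,\rho(p)\,dp .
\end{align*}
```

Moving the second integral to the left-hand side gives exactly the inequality of Lemma 1.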

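Proof of Theorem 2. Suppose that, contrary to the
statement of Theorem 2, there exists $\delta > 0$ such that

$$\liminf_{n \to \infty} \left( \frac{1}{n} \sum_{i=1}^{n} \log(f_i(p_i)) - \frac{1}{n} \sum_{i=1}^{n} \log(f(p_i)) \right) < -\delta. \qquad (5)$$

Then choose an $\varepsilon$ satisfying $0 < \varepsilon < \delta/4$.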

Substituting the definition of $\rho(p)$ into Lemma 1 we
obtain

$$\int_0^1 \log(\rho(p))\,dP \ge \int_0^1 \log(f(p))\,dP. \qquad (6)$$



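From the stability of $(p_1, p_2, \ldots)$ it follows that there
exists a number $N_1 = N_1(\varepsilon)$ such that, for all $n > N_1$,

$$\left| \int_0^1 \log(f(p))\,dP_n - \int_0^1 \log(f(p))\,dP \right| < \varepsilon$$

and

$$\left| \int_0^1 \log(\rho(p))\,dP_n - \int_0^1 \log(\rho(p))\,dP \right| < \varepsilon.$$

Then inequality (6) implies that, for all $n > N_1$,

$$\int_0^1 \log(\rho(p))\,dP_n \ge \int_0^1 \log(f(p))\,dP_n - 2\varepsilon.$$

By the definition of the probability measure $P_n$, the
last inequality is the same thing as

$$\frac{1}{n} \sum_{i=1}^{n} \log(\rho(p_i)) \ge \frac{1}{n} \sum_{i=1}^{n} \log(f(p_i)) - 2\varepsilon. \qquad (7)$$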



By the consistency of $(f_1(p), f_2(p), \ldots)$ there exists a
number $N_2 = N_2(\varepsilon)$ such that, for all $i > N_2$ and all
$p \in [0, 1]$,

$$\left| \log(f_i(p)) - \log(\rho(p)) \right| < \varepsilon. \qquad (8)$$
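
Let us define the number

$$M = \max_{i,\,p} \left| \log(f_i(p)) - \log(\rho(p)) \right|. \qquad (9)$$

From (8) and (9) we have

$$\left| \log(f_i(p)) - \log(\rho(p)) \right| \le \begin{cases} M, & i \le N_2, \\ \varepsilon, & i > N_2. \end{cases} \qquad (10)$$

Denote $N_3 = \max(N_1, N_2)$. Then, using (10) and (7),
we obtain, for all $n > N_3$,

$$\frac{1}{n} \sum_{i=1}^{n} \log(f_i(p_i)) \ge \frac{1}{n} \sum_{i=1}^{n} \log(f(p_i)) - 3\varepsilon - \frac{M N_2}{n}.$$

Denoting $N_4 = \max(N_3, M N_2 / \varepsilon)$, we can rewrite the last
inequality as

$$\frac{1}{n} \sum_{i=1}^{n} \log(f_i(p_i)) \ge \frac{1}{n} \sum_{i=1}^{n} \log(f(p_i)) - 4\varepsilon$$

for all $n > N_4$. Finally, recalling that $\varepsilon < \delta/4$, we have,
for all $n > N_4$,

$$\frac{1}{n} \sum_{i=1}^{n} \log(f_i(p_i)) - \frac{1}{n} \sum_{i=1}^{n} \log(f(p_i)) \ge -\delta.$$

This contradicts (5) and therefore completes the proof
of Theorem 2. □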

4. Empirical results 

In this section we investigate the performance of our 
plug-in martingale and compare it with that of the 
simple mixture martingale. Two real-life datasets have 
been tested for exchangeability: the USPS dataset and 
the Statlog Satellite dataset. 

4.1. USPS dataset 

Data. The US Postal Service (USPS) dataset consists
of 7291 training examples and 2007 test examples of
handwritten digits, from 0 to 9. The data were collected
from real-life zip codes. Each example is described by
256 attributes representing the pixels of a
16 x 16 gray-scale image of the digit, together with its label.
It is well known that the examples in this dataset are 
not perfectly exchangeable (Vovk et al., 2003), and 
any reasonable test should reject exchangeability there. 
In our experiments we merge the training and test 
sets and perform testing for the full dataset of 9298 
examples. 

Figure 2 shows the typical performance of the martingales
when the exchangeability assumption is satisfied
for sure: all examples have been randomly shuffled
before the testing.

Figure 2. The growth of the martingales for the USPS
dataset randomly shuffled before on-line testing. The exchangeability
assumption is satisfied: the final martingale
values are about 0.01. (Legend: simple mixture, plug-in; horizontal axis: index of example.)

Figure 4 shows the performance of the martingales 
when the examples arrive in the original order: first 
7291 of the training set and then 2007 of the test set. 
The p-values are generated on-line by Algorithm 1 
and the two martingales are calculated from the same 
sequence of p-values. The final value for the simple 
mixture martingale is $2.0 \times 10^{10}$, and the final value
for the plug-in martingale is $3.9 \times 10^{8}$.

Figure 6 shows the betting functions that correspond 
to the plug-in martingale and the "best" power mar- 
tingale. For the plug-in martingale, the function is 
the estimated probability density function calculated 
using the whole sequence of p-values. The betting function
for the family of power martingales corresponds to
the parameter $\varepsilon^*$ that provides the largest final value
among all power martingales. It gives a clue why we 
could not see advantages of the new approach for this 
dataset: both martingales grew up to approximately 
the same level. There is not much difference between 
the best betting functions for the old and new meth- 
ods, and the new method suffers because of its greater 
flexibility. 

4.2. Statlog Satellite dataset 

Data. The Statlog Satellite dataset (Frank & Asuncion,
2010) consists of 6435 satellite images (divided
into 4435 training examples and 2000 test examples).
The examples are 3 x 3 pixel sub-areas of the satellite
picture, where each pixel is described by four spectral
values in different spectral bands. Each example is
represented by 36 attributes and a label indicating
the classification of the central pixel. Labels are numbers
from 1 to 7, excluding 6. The testing results are
described below.

Figure 3. The growth of the martingales for the Statlog
Satellite dataset randomly shuffled before on-line testing.
The exchangeability assumption is satisfied: the final martingale
values are about 0.01.

Figure 3 shows the performance of the martingales 
for randomly shuffled examples of the dataset. As 
expected, the martingales do not reject the exchange- 
ability assumption there. 

Figure 5 presents the performance of the martingales 
when the examples arrive in the original order. The 
final value for the simple mixture martingale is $5.6 \times 10^{2}$
and the final value for the plug-in martingale is
$1.8 \times 10^{17}$. Again, the corresponding betting functions for the
plug-in martingale and the "best" power martingale are
presented in Figure 7. For this dataset the generated
p-values have a tricky distribution. The family of
power betting functions $\varepsilon p^{\varepsilon - 1}$ cannot provide a good
approximation. The power martingales lose on p-values 
close to the second peak of the p-values distribution. 
But the plug-in martingale is more flexible and ends 
up with a much higher final value. 

It can be argued that both methods, old and new, work 
for the Statlog Satellite dataset in the sense of rejecting 
the exchangeability assumption at any of the commonly 
used thresholds (such as 20 or 100). However, the 
situation would have been different had the dataset 
consisted of only the first 1000 examples: the final value 
of the simple mixture martingale would have been 0.013 
whereas the final value of the plug-in martingale would 
have been $3.74 \times 10^{15}$.




Figure 4. The growth of the martingales for the USPS
dataset. For the examples in the original order the exchangeability
assumption is rejected: the final martingale
values are greater than $3.8 \times 10^{8}$. (Horizontal axis: index of example.)

5. Discussion and conclusions 

In this paper we have introduced a new way of constructing
martingales for testing exchangeability on-line.
We have shown that for stable sequences of
p-values the new more adaptive martingale asymptotically
grows at least as fast as any
martingale with a fixed betting function. The experiments
of testing two real-life datasets have been presented.
Using the same sequence of p-values the plug-in
martingale extracts approximately the same amount or 
more information about the data-generating distribu- 
tion as compared to the previously introduced power 
martingales. 

Remark. The previous studies were based on the nat- 
ural idea that lack of exchangeability leads to new 
examples looking strange as compared to the old ones 
and therefore to small p-values (for example, if the data- 
generating mechanism changes its regime and starts 
producing a different kind of examples). There is, however,
a situation where lack of exchangeability makes
the p-values cluster around 1: we observe examples
that are ideal shapes of several kinds distorted by ran- 
dom noise, and the amount of noise decreases with 
time. Predicting the kind of a new example using the 
nonconformity measure (1) will then tend to produce 
large p-values. 

Our goal has been to find an exchangeability martingale
that does not need any assumptions about the
p-values generated by the method of conformal prediction.
Our proposed martingale adapts to the unknown
distribution of the p-values by estimating a good betting
function from the past data. This is an example of
the plug-in approach. It is generally believed that the
Bayesian approach is more efficient than the plug-in
approach (see, e.g., Bernardo & Smith, 2000, p. 483).
In our present context, the Bayesian approach would
involve choosing a prior distribution on the betting
functions and integrating the exchangeability martingales
corresponding to these betting functions over the
prior distribution. It is not clear yet whether this can
be done efficiently and, if yes, whether this can improve
the performance of exchangeability martingales.

Figure 5. The growth of the martingales for the Statlog
Satellite dataset. For the examples in the original order the
exchangeability assumption is rejected: the final value of
the simple mixture martingale is $5.6 \times 10^{2}$, and the final
value of the plug-in martingale is $1.8 \times 10^{17}$.

Figure 6. The betting functions for testing the USPS
dataset for examples in the original order.